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Abstract 

Reconstruction of a dynamical system from a time series requires the selection of two 
parameters, the embedding dimension dg and the embedding lag r. Many competing 
criteria to select these parameters exist, and all are heuristic. Within the context 
of modeling the evolution operator of the underlying dynamical system, we show 
that one only need be concerned with the product dgT. We introduce an information 
theoretic criteria for the optimal selection of the embedding window d^, = dgT. For 
infinitely long time series this method is equivalent to selecting the embedding lag 
that minimises the nonlinear model prediction error. For short and noisy time series 
we find that the results of this new algorithm are data dependent and superior to 
estimation of embedding parameters with the standard techniques. 
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1 Reconstruction 



The celebrated theorem of Takens [1] guarantees that, for a sufficiently long 
time series of scalar observations of an n-dimensional dynamical system with 
a measurement function, one may recreate the underlying dynamics (up to 
homeomorphism) with a time delay embedding ^ . Unfortunately the theorem 
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^ Takens' theorem has many extensions and is described in various forms by sev- 
eral contemporary authors. We do not intend to dwell on the evolution of these 
fundamental results here. 
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is silent on exactly how to proceed when the data is limited and contami- 
nated by noise. In practice, time delay embedding is routinely employed as 
a first step in the analysis of experimentally observed nonlinear dynamical 
systems (see [2,3]). Typically, one identifies some characteristic embedding lag 
r (usually related to the sampling rate and time scale of the time series under 
consideration) and utilises de lagged version of the scalar observable for suffi- 
ciently large d^. In general, r is determined by identifying linear or nonlinear 
temporal correlations in the data and one will progressively increase dg until 
the results obtained are self consistent. 

In this paper we consider the problem of reconstructing the underlying dy- 
namics from a finite scalar time series in the presence of noise. We recognise 
that in general the quality of the reconstruction will depend on the length of 
the time series and the amount of noise present in the system. Employing the 
minimum description length model selection criteria we show that the optimal 
model of the dynamics does not depend on the choice of the embedding lag, 
only on the maximum lag {deT in the above scheme). We call that maximum 
embedding lag d^ '■= dgT, the embedding window, and show that for long 
noise-free time series the optimal minimises the one-step model prediction 
error. For short or noisy data, the optimal value of dyj is data dependent. To 
estimate the one-step model prediction error and cZ^ we apply a generic local 
constant modeling scheme to several computational examples. We show that 
this method proves to be consistent and robust, and the results that we obtain 
capture the salient features of the underlying dynamics. Finally, we also find 
that in general there is no single characteristic time lag r. Generically, the 
optimal reconstruction may be obtained by considering the lag vector 



(ri,T2,...,Tfc) (1) 

where < < Tj+i < d^, ^ ■ 

The textbooks [2,3] contain copious detail on the estimation of de and r. We 
briefiy review only the most relevant developments here. 

Often, the primary aim of time delay embedding is the estimation of dynamic 
invariants. In these instances, one may estimate r with a variety of heuristic 
techniques: usually autocorrelation, pseudo-period or mutual information. One 
then computes the dynamic invariant for increasing values of de until some sort 
of plateau onset occurs (see [5] and the references therein). For estimation 
of correlation dimension, dc, it has been shown that dg > d^ is sufficient 
[6]. However, for reconstruction of the underlying dynamics this is not the 
case. Alternatively, the method of false nearest neighbours [7] and its various 
extensions apply a topological reasoning: one increases de until the geometry 

^ This is the so called "variable embedding" described in [4] and elsewhere. 



2 



of the time series does not change. 

We note that several authors have speculated on whether the individual pa- 
rameters de and r, or only their product dgT, is significant. For example, Lai 
and Lerner [8] provide an overview of selection of embedding parameters to es- 
timate dynamic invariants (in their case, correlation dimension). They impose 
some fairly generous constraints on the correlation integral and use these to 
estimate the optimal value of de and r. Their numerical results from long clean 
data imply that correct selection of r is crucial, selection of d^ (and therefore 
de) is not. Conversely, utihsing the BDS statistic [9], Kim and co-workers [5] 
concluded that the crucial parameter for estimating correlation dimension is 

dyj. 

Unlike these previous methods, the question we consider is: "What is the op- 
timal choice of embedding parameters to reconstruct the underlying dynamic 
evolution from a time series?" In answering this question we conclude that 
only the embedding window d^ is significant, selection of optimal embedding 
lags is, essentially, a modeling problem [4]. Clearly, successful reconstruction 
of the underlying dynamics will depend on ones' ability to identify any under- 
lying periodicity (and therefore r). The results of this paper shows that it is 
possible to estimate the optimal value of du,, and subsequently use this optimal 
value to derive a suitable embedding lag r. However, as previous authors have 
observed in many examples, estimation of r for nonlinear systems is model 
dependent [4] (and may even be state dependent). 

In the following section we introduce our main result and the rational for the 
calculations that follow. Section 3 demonstrates the application of this method 
to several test systems, and section 4 studies the problem of modelling several 
experimental time series. In section 5 we conclude. 



2 A modeling peiradigm 



Let (f) : Ai — > Ai be the evolution operator of a dynamical system, and h : 
Ai — > R a differentiable observation function. Through some experiment 
we obtain the time series {h{Xi),h{X2), . . . ,h{XN)}. Denote Xi = h{Xi). 
Takens' theorem [1] states that for some m > the mapping g 



Xi I > {p^i-i ^i—l-i ^i—2i ■ ■ ■ 1 -^i— m— l) (2) 
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is such that the evolution of g{xi) — {xi, Xj-i, . . . , Xi-m-i) ^ is homeomorphic 
to 

We will generahse the embedding map (2) and consider g as 

Xi 1-^ {aiXi, a2Xi-i, a3Xi-2, aaXi-d-i). (3) 

The objective of a successful embedding is to find a — [ai, 02, . . . , a^] where 
Oj e {0, 1}. Note that g{xi) is simply the subspace projection of g{xi) onto a, 

g{xi) = Froi^g{xi). 

The embedding is completely defined by a G {0, l}'^ and we wish to make the 
best choice of a and d, which we write {a,d). Note that, in general one could 
consider a e R'^'='^. We restrict ourselves to {0, l}*^ as the more general case is 
concerned with the optimal model of the dynamics rather than the necessary 
information. For a uniform embedding with embedding parameters de and r 
we have that a e {0, 1}°' and (a)i 7^ if and only if r divides i. 

Let Zi — g{xi) e R'^ and let 

m 

f{z) = ^Xi9{z;Wi) (4) 
1=1 

where 9 is some basis and e R and Wi e R^ are linear and nonlinear model 
parameters. The selection of this particular model architecture is arbitrary, but 
does not alter the results. We assume that there exists some algorithm to select 

V = (m, Ai, A2, . . . , Xm, Wi,W2,..., Wk) such that = f{zi_i) - z., ~ iV(0, o) 
(or at the very least, ~ Zif' = is minimised). We do not consider 

the model selection problem here, rather we seek to find out what is the best 
choice of (a, d). Our own model selection work is summarised in [10]. 

The most obvious approach to this problem is to look for the maximum like- 
lihood solution: 

max max P{x\xn, a, d, V) 

{a,d) V VI"'''/ 

where x is the vector of all the time series observations and xq G R'^ is a 
vector of model initial conditions. Unfortunately this leads to the redundant 
solution d = N. To solve this problem one could either resort to Bayesian 

^ In writing g{xi) = (xj, . . . , Xi^rn-i) we take a slight liberty with the notation, 
but the meaning remains clear. 
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regularisation [11] or the minimum description length model selection criteria 
[12]. We choose the later approach. 

The description length of a time series is the length of the shortest (most 
compact) description of that time series. The description length of a time series 
with respect to a given model is the length of the description of that model, 
the initial conditions of that model and the model prediction error. We intend 
to optimise the description length of the observed time series {xi}fL^ = x 
with respect to {a,d). At this point we make the fairly cavalier assumption 
that for a given (a, d) one can obtain the optimal model V. We will address 
this assumption in more detail later in this section. 

The description length of the data DL{x) is given by 



DL{x) = DL{x\xo, a, d, V) + DL{xq) + DL{a, d) + DL{V) (5) 

where xq = {xi,X2, - ■ ■ ,Xd) are the model initial conditions. Notice that the 
description length of the model prediction errors DL{x\xo,a,d,V), is equal 
to the negative log likelihood of the errors under the assumed distributed. 
Similarly Xq is a sequence of d real numbers which for small d we approximate 
by d realisations of a random variable. Therefore DL{xq) can also be computed 
as a negative log-likelihood of some probability distribution. If we assume that 
X and Xq are approximated by Gaussian random variables with variance 
and (t|) respectively, then (5) becomes 



DL{x)^-\nP{x\N{id,a'^)) (6) 
-\nP{xQ\N{<d,a']^)) + d + DL{d) + DL{V). 

Since a is a sequence of d independent zeros or ones DL{a) = d, further- 
more the description length of an integer d is given by DL{d) = [log((i)] + 
[log[log(d)]] + . . . where the last term in this expansion is 0[12]. Compared 
to the term d, DL{d) is very slowly varying and has little effect on the results. 
The final term DL{V) is the description length of the optimal model for the 
given (a, d). 

Substituting for the probability distributions P{x\N{Q, a^)) and P(a:o| A^(0, (t\)) 
and estimating and a\ directly from the data, one finally obtains 



DL{x) f« ^ (l + In 271(7^) + (l + In 27r(7^) (7) 

+ DL{x) + d + DL{d) + DL{V) 
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(8) 



+ — (1 + In 27r) + d + DL{d) + DL(x) + DL{r). 



In this form, equation (8) provides the first suggestion of what the optimal em- 
bedding strategy should be. We see that a does not feature in this calculation. 
Hence, if we adopt the modeling paradigm suggested here, the embedding lag 
(or more generally the embedding strategy) is not crucial: one should only be 
concerned with the maximum embedding dimension d. Of course, this does 
not mean that to reconstruct the dynamics the embedding lag is unimportant. 
When one applies numerical modeling to reconstruct the dynamics, embed- 
ding strategies are of very great significance, however selection of the optimal 
embedding co-ordinates (or rather those that are most significant in predicting 
the dynamics) is inherently part of the modeling process [4]. Furthermore, the 
modelling algorithm should be allowed to choose from all possible embedding 
lags within the embedding window. Indeed, one often finds that the "optimal" 
embedding strategy is not fixed within a single model [4]. This result shows 
that it is preferable to identify the embedding window d^ and let the model 
building process determine which of the d^ co-ordinates are most useful. 

The description length of the mean of the data DL{x) is a fixed constant and 
we drop it from the calculation. Optimising (8) over all (a, d) requires selection 
of the optimal model for a given (a, d) and computation of the model prediction 
error of that model. For a given model, DL{V) can be calculated precisely [13]. 
However, selection of the optimal model is a more difficult problem. 

Instead, we restrict our attention to a particular class of model, and choose 
the optimal model from that class. To simplify the computation of (8) we 
restrict our attention to the class of local constant models on the attractor. We 
have two good reasons for choosing this particular class. Firstly, because the 
models are simple, estimates of the error as a function of (a, d) are relatively 
well behaved. Secondly, these models rely on no additional parameters and 
therefore DL{V) = 0, simplifying our calculation considerably^ . 

In trials, we tested many alternative model classes. We found radial basis 
functions [13] and neural networks [10] to be excessively nonlinear and diffi- 
cult to optimise for the purpose of determining embedding windows. Complex 
local modeling regimes such as triangulation and tessellation [14] or parameter 
dependent local linear schemes [15] we found to be overly sensitive to small 
changes in the data. In comparison the local constant scheme we employ here 
appears remarkably robust. 



^ Alternatively, one could argue that the data are the parameters, in either case 
the description length of the model is constant. 
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As local constant models have no explicit parameters (other than the embed- 
ding strategy (a, d)), DL(V) = 0. Therefore, for a given (a, d) computation of 
(8) only requires estimation of X^e?. We employ an in-sample local constant 
prediction strategy. Let Zg be the nearest neighbour to Zt (excluding Zt), then 



xt+i = Xs+i (9) 

and therefore Ct+i — Xt+i — Xs+i- In other words, for each point in the time 
series we determine the prediction error based on the difference between the 
successor to that point and the successor to its nearest neighbour^ . Since this 
is a form of interpolation rather than extrapolation, this strategy does not 
provide a predictive model, likewise (as with all local techniques) it does not 
describe the underlying dynamics. However, the strength of this particular 
approach is that it is simple and it provides a realistic estimate of the size of 
the optimal model's prediction error as a function of (a, d). 

The proposed algorithm may be summarised as follows. We seek to minimise 
(8) over d. To achieve this we need to estimate the model prediction error as 
a function of d. Hence, for increasing values of d we employ the local constant 
"modelling" scheme suggested by (9) to compute the model prediction error 
and substitute this into (8). The optimal embedding window dyj is the value 
of d that minimises (8). 



3 Examples 



In this section we describe the application of the above method to several 
numerical time series. First, we examine the performance of the algorithm 
and importance of the choice of modelling algorithm (9). 

The example we consider is 2000 points of the x component of a numerically 
integrated (sampling rate of 0.2) trajectory of the Rossler system, contam- 
inated by additive Gaussian noise with a standard deviation of 5% of the 
standard deviation of the data. The Rossler equations are given by 



X * 



y 



\ 



-y- z 
X + ay 
+ z{x — c) J 



(10) 



This is a technique sometimes referred to as "drop-one-out" interpolation. 
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Fig. 1. Computation of description length as a function of embedding 
window for Rossler time series data. The sohd Una and dot-dashed Une are 
proportional to the logarithm of the sum of the squares of the model prediction 
error using a local constant and local linear method respectively. The local constant 
model utilised is described in section 2, the local linear scheme is that described 
in [15]. For the second modelling scheme a clear minimum occurs at 4. The lo- 
cal constant modelling scheme employs only lags that provide an improvement to 
model prediction error and its error as a function of embedding window is therefore 
monotonic (plateau onset occurs at 34). For small values of embedding window the 
linear scheme performs best, but for large values, behaviour is poor and extremely 
erratic. Computation of description length utilising the local constant scheme (solid 
line with circles) yields an optimal embedding window of 15. For clarity, the values 
dw = 4, 15, 34 are marked as vertical dashed lines. 

where a = 0.398, b = 2 and c = 4. For these parameter values the data 
exhibits broad band chaos. Figure 1 demonstrates the computation of (8) as a 
function of embedding window. To estimate model prediction error we employ 
the rather simple interpolative scheme described in the previous section. For 
comparison, the performance of alternative (more complex) modelling schemes 
is also shown in figure 1. We find that alternative, more parametric, modelling 
methods produce results which are sensitively dependent on "correct" choice 
of modelling algorithm parameters ^ . 

The first zero of the autocorrelation function occurs at a lag of 8 and the data 
exhibits a pseudo period of about 31 samples. With the embedding lag set at 
8, false nearest neighbours indicates a minimum embedding dimension of 4. 
Standard methods, therefore, suggest an embedding window of roughly 32. 

By coincidence ^ , the minimum of the model prediction error for a constant 
model occurs at this value. Conversely, the minimum of the error of the local 



By modelling algorithm parameters we mean parameters associated with the 
model selection scheme itself rather than only the parameters optimised by that 
scheme. 

^ In other examples, and for other amounts of noise or with other lengths of data 
this proved not to be the case. 
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linear model occurs at a value of 4. This comparatively low value of embed- 
ding window is due to the relative complexity of the local linear modelling 
scheme [16]. Although this scheme performs best for small embedding win- 
dows, the additional information introduced with larger embedding windows 
is not recognised by this scheme. The main reason for this is that the pa- 
rameters of the scheme (neighbourhood size, neighbourhood weights and so 
on) are also dependent on the embedding dimension and embedding lag. For 
example, values of neighbourhood size which work well for a small dimension 
embedding may not work well for larger embedding dimension. Moreover, as 
embedding dimension becomes larger it becomes difficult to find good values 
for these parameters.. This general behaviour is observed in every example we 
consider. Therefore, although the local linear scheme often provides a good 
estimate of the optimal embedding dimension (as would false nearest neigh- 
bours) , the description length estimated from a local constant model provides 
a much better estimate of the optimal embedding window. 

We have already mentioned that the local constant modelling scheme selects 
only lags that provide some improvement in model prediction error. Clearly, as 
dw increases there is a combinatorial explosion. To address this combinatorial 
explosion is both difficult and beyond the requirements of this algorithm. We 
consider only whether the addition of successive lags offers an improvement. 
Suppose for a dg dimensional embedding the chosen model includes the lags 
{ii,£2, ■ ■ ■ , ^k} (where < ii < ii < < ik < dg). To determine the set of 
model lags for the ((ie-l-l)-dimensional embedding we consider the performance 
of the local constant model with lags {^i, ^2, ■ ■ ■ Aki de}- If this model performs 
better than the model with lags {£1, ^2, ■ ■ ■ , (-k} then it is accepted, otherwise 
we retain only the lags {^1, ^2, • • • , ^k}- 

Therefore, the selected lags may be used as an estimate of the optimal lags 
for a generalised variable embedding (1). In the case of the Rossler system 
data analysed in figure 1, the optimal lags were 1 to 15 and 19, 20, 24, 26, 29, 
32 and 34. Altogether, 22 different lags. Clearly, a 22 dimensional embedding 
is excessive, and some subset of these lags would probably prove sufficient. 
Moreover, the minimum description length optimal embedding window is 15, 
hmiting the selection to the first 15 lags. It is reasonable to suppose that each 
of these large number of lags may contribute some significant novel information 
to the modelling scheme. However, the expression we hope to optimise (8) is 
independent of which lags are included (indeed, in this example, they are all 
included^ ), and therefore we do not consider this issue more closely here. We 
defer the selection of optimal lags from this set for the modelling phase of 
dynamic reconstruction. 

In figure 2 we examine the effect of various noise levels and different length 



^ This is not the case in general. 
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Fig. 2. Optimal embedding dimension as a function of data length and 
noise level The solid bars depict the optimal model size utilising the methods 
described in this paper for a single realisation of Rossler time series data. The panel 
on the left is for a fixed noise level of 5% and time series length between 118 and 
5000 data. The panel on the right is for fixed data length of 2108 data and various 
noise levels (expressed as percentage of the standard deviation of the data) . For the 
cases where noise was added to the time series, the results depicted here are for 
a single realisation of that noise (not an average). This is the likely cause of the 
moderate variation in the results observed for larger noise levels. For comparison, 
the embedding window that yielded minimum error for the local constant (asterisks) 
and local linear (circles) models is also shown. 

time series on the selection of embedding window. We observe that for longer 
time series, the optimal embedding window is larger. This is consistent with 
what one might expect. For short time series the optimal model can only 
capture the short term dynamics and therefore only recent past history (a 
small embedding window) is required. For larger quantities of data one is able 
to characterise the more sensitive long term dynamics and a larger embed- 
ding window provides significant advantage. Initially, an embedding window 
of about 10 is sufficient, while for the longest time series an embedding window 
of 35 is optimal. Significantly, these two values correspond to approximately 
the first zero of the autocorrelation function (or one-quarter of the pseudo- 
period) and the pseudo-period of the observed time series. 

We note in passing, that, the optimal embedding window for the local con- 
stant window is an upper bound on the minimum description length best 
window. This is as we would expect. The description length is the sum of a 
term proportional to the model prediction error and a function which increase 
monotonically with embedding dimension (the description length of the local 
constant model). Therefore the minimum of the model prediction error must 
be no less than the minimum of the description length. 

Conversely, we find that the optimal embedding window for the local linear 
method remains about 4 or 5 (roughly corresponding to the optimal embedding 



10 



Lorenz: embedding window; fixed 5% noise 

20 I ' ; 




Ql 1 

4 5 6 7 8 9 

log(time series length (n)) 



Lorenz: Optimal embedding window; fixed 2108 data points 
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Ikeda: embedding window; fixed 5% noise 
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Fig. 3. Optimal embedding windows for Lorenz and Ikeda time series. The 

calculations depicted in figure 2 are repeated for time series of two standard systems. 
The top two panels are for a single realisation of the chaotic Lorenz system, the 
bottom two panels are for a single realisation of the chaotic Ikeda map. The solid 
bars depict the optimal model size utilising the methods described in this paper. 
The leftmost panels are for a fixed noise level of 5% and time series length between 
118 and 5000 data. The panels on the right are for fixed data length of 2108 data 
and various noise levels (expressed as percentage of the standard deviation of the 
data). This is the likely cause of the moderate variation in the results observed for 
larger noise levels. For comparison, the embedding window that yielded minimum 
error for the local constant (asterisks) and local linear (circles) models is also shown. 

dimension) . 

Variation in the noise level for a fixed length time series demonstrates simi- 
lar behaviour. For noisier time series a larger embedding window is required, 
as increasing the noise on each observation decreases the useful information 
provided. As the information provided to the optimal model by each obser- 
vation decreases, more observations (a larger embedding window) is required 
to provide all the available information. For noise levels of up to 30% this 
method provides consistent, repeatable, results. Noisier time series tend to 
yield a larger variation in the optimal estimates of embedding window. Note 
that in contrast, the local linear scheme performs progressively worse, utilising 
a diminishing window as the noise level is increased. We believe that this is due 
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model 


MDL 


RMS 


size 


Standard (4 = 4, r = 8) 


-655 ± 23 


0.158 ± 0.003 


15.6 ±2.9 


Windowed (d^ = 15) 


-716 ±17 


0.151 ±0.004 


21.1 ±5.5 



Table 1 

Comparison of model performance with standard constant lag embedding (a Stan- 
dard Embedding) and embedding over the embedding window suggested in figure 

2 (a Windowed Embedding). Figures quoted are the mean of 60 nonlinear models, 
fitted with a stochastic optimisation routine to the same data set, and standard de- 
viations. Figures quoted here are for 2000 data points with 5% noise, other values of 
these parameters gave similar, consistent, results. The three indicators are minimum 
description length (MDL) of the optimal model, root-mean-square model prediction 
error (RMS) and the model size (number of nonlinear terms in the optimal model). 
For each indicator, the new embedding strategy shows clear improvement. MDL 
and RMS have decreased, indicating a more compact description of the data and a 
smaller prediction error, respectively. Conversely, the mean model size has increased 
indicating that more structure is extracted from the data. Several other measures 
were also considered: mean amplitude of oscillation, correlation dimension, entropy 
and estimated noise level [18]. However, for each of these measures the variance be- 
tween simulations of models built using the same embedding strategy was as large 
as that between the different embedding strategies. The results of these calculations 
are therefore omitted. 

to the additional parametric complexity of this modelling method. As more 
noise is added to the data, the (relatively) complex rules used to determine 

near neighbours and derive a weighted linear prediction from these, becomes 
more prone to the system noise, and actually performs worse. 

In figure 3 we repeat the above calculations for time series generated from the 
standard chaotic Lorenz system and the Ikeda map [17]. Variation of optimal 
embedding window as a function of noise and data length for the Lorenz data is 
very similar to the results depicted in figure 2 for the Rossler system. Increasing 
noise level or time series length yields a larger optimal model. Furthermore, 
optimal embedding window values tend to coincide with the pseudo-period of 
the time series, or one-quarter, or one-half of this value. 

Results for the Ikeda map are substantially different. In this case the optimal 
embedding window estimated coincides with the value that minimises the error 
of the local constant and linear models. In general, an embedding dimension 
of 3 or 4 is suggested, and this is what one would expect for this system . 

We now return to the main purpose of estimating the embedding window, 
namely the reconstruction of the dynamics. For the Rossler system analysed 

Although the fractal dimension of the Ikeda map is less than two, a delay recon- 
struction of this map is highly "twisted" and requires an embedding dimension of 

3 or 4 to successfully remove all intersecting trajectories. 
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Fig. 4. Experimental time series data The three experimental time series ex- 
amined in this paper are depicted (from top to bottom): annual sunspot numbers 
for the period 1700 to 2000, a recording of human electrocardiogram rhythm during 
ventricular fibrillation, and the chaotic "Santa-Fe" laser times series. For the lower 
two panels only the first 2000 points are utilised for time series modelling. 

in figures 1 and 2 we build nonlinear models following the methods described in 
[4] with embedding suggest by either autocorrelation and false nearest neigh- 
bours (namely dg = 4 and r = 8), hereafter referred to as a Standard Em- 
bedding, or with the embedding window (of 34), hereafter a Windowed Em- 
bedding. Table 1 compares the average model size (number of nonlinear basis 
functions in the optimal model) and model prediction error for 60 models of 
this time scries (2000 observations and 5% noise) with each of these two cm- 
bedding strategies. These models are built to minimise the description length 
of the data given the model, and therefore a comparison of the optimal model 
description length is also given. These qualitative measures show a consistent 
improvement in the model performance for the model built from the windowed 
embedding. 



4 Applications 

We now consider the application of this method to three experimental time 
series: the annual sunspots times series [19], human electrocardiogram (ECG) 
recordings of ventricular fibrillation (VP) [20,21], and experimental laser in- 
tensity data [22,23]. The raw time series data are depicted in figure 4. 

Since the main motivation for selection of embedding window with the method 
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data 




HiVlD 


size 




sunspots 

(de = 6, r = 3) 

/J c\ 

[dw — Dj 


1267.9 ± 12.1 

1 OQH 1 1 1 1 

iZoyJ.L ± 11. D 


13.16 ± 1.116 
Iz.ol zL U.OooD 


7.32 ± 1.818 
D.yo =t i.ou 


0.938 ± 0.456 
U. / ooD =t U.4140 


VF ECG 

{de = 5, T = 8) 

(d — 2) 


-14333 ± 28 
— 141fi3 + 14 


0.02468 ± 0.0003846 
n 02751 + 0001500 


31.47 ±7.326 
1 K S8 + 2 66 


1.003 ± 0.099 
1 1 24 ± 1 07 


laser 

{de = 5, r = 2) 
{d^ = 10) 


5753.6 ± 153.9 
5239.8 ± 159.0 


2.405 ± 0.2954 
1.767 ±0.1992 


100.8 ± 12.3 
109.5 ± 12.3 


n/a 

0.8637 ± 0.7999 



Table 2 

Comparison of model performance with standard constant lag embedding and em- 
bedding over the embedding window suggested in figure 2. Figures quoted are the 
mean of 60 nonlinear models, fitted with a stochastic optimisation routine to the 
same data set, and standard deviations. Figures quoted here are for 2000 data points, 
where more data is available, longer time series samples gave similar, consistent, re- 
sults. The four indicators are minimum description length (MDL) of the optimal 
model, root-mean-square model prediction error (RMS), the model size (number 
of nonlinear terms in the optimal model), and the correlation dimension (CD) of 
the free run dynamics. For the laser time series, none of the models built using the 
standard embedding produced stable dynamics and it was therefore not possible to 
estimate correlation dimension. The correlation dimension estimated directly from 
these three data sets was 0.396, 1.090, 1.182 (note that the low value for the first 
data set is an artifact of the short time series). 

described in this paper is to improve modelling results we concentrate exclu- 
sively on the comparison of the performance of nonlinear models of this data 
with standard embedding techniques and the windowed embedding suggested 
by the algorithm proposed here. By construction, the local constant modelling 
scheme performs best with the windowed embedding. Therefore, we consider 
a more complicated nonlinear radial basis modelling algorithm, first proposed 
in [13] and most recently described in [10]. Like the windowed embedding 
strategy, this modelling scheme is designed to optimise the description length 
of the time series [10]. 

We are interested in two types of measures of performance: short term be- 
haviour (for example mean square prediction error) and dynamic behaviour 
(invariant measures of the dynamical systems). Results equivalent to those 
depicted in table 1 have also been computed and are summarised in table 2. 

Table 2 shows that for the sunspot time series and the experimental laser 
intensity recording, the windowed embedding improved model performance. 
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Fig. 5. Typical model behaviour using the standard embedding strategy 
{dg, t): Three simulated time series from models of the experimental data examined 
in this paper are depicted. The panels correspond to those of figure 4 and the 
horizontal and vertical axes in these figures are fixed to be the same values as the 
corresponding panels of figure 4. 

That is, the description length was lower, the one-step model prediction error 
was less and the models were larger. However, with the exception of one step 
model prediction error the difference in these measures was not statistically 
significant. For the recording of human VF the new method did not improve 
model performance and, in fact, the optimal embedding window was — 2: 
substantially smaller than one would reasonably expect from such a complex 
biological system. It seems plausible, that in this case, the time series under 
consideration is too short, noisy or non-stationary (this conclusion is supported 
by figure 4). Finally, we note that the result for the sunspot time series is 
particularly encouraging because this improvement in short term predictability 
is achieved with a much smaller embedding {d^ — 6 compared to dgT — 18). 

However, as has been observed elsewhere [10], short term predictability is 
not the best criteria with which to compare models of nonlinear dynamical 
systems. Therefore, for each model we estimated correlation dimension, noise 
level and entropy, using a method described in [18]. Furthermore, under the 
premise that these models should exhibit pseudo-periodic dynamics we also 
computed mean limit cycle diameter (i.e. the amplitude of the limit cycle 
oscillations. In every case we found that the dynamics exhibited by models 
built from the traditional (i.e. uniform) embedding strategy was more likely 
to either be a stable fixed point or divergent. 

Finally, figures 5 and 6 show typical noise free dynamics in models of each of 
these three systems. No effort was made to ensure that the models performed 
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Fig. 6. Typical model behaviour using the windowed embedding strategy 
(dw)'- Three simulated time series from the experimental data examined in this 
paper are depicted. The panels correspond to those of figure 4 and the horizontal 
and vertical axes in these figures are fixed to be the same values as the corresponding 
panels of figure 4. 
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Fig. 7. Observed electrocardiogram trace after resuscitation: Following the 
episode of VF depicted in figure 4, electrical resuscitation was successfully applied. 
The waveform depicted in this figure was observed 120 seconds after application of 
defibrillation. Notice that the waveform observed here is similar to the asymptotic 
behaviour of the model depicted in figure 5. Both the period and the shape of these 
time series are similar. 

well and the models and simulations presented in these figures were selected 
at random. For the sunspots and laser dynamics (top and bottom panels) the 
new method clearly performs better. Typically, the original method produced 
laser dynamics and sunspots simulations that were divergent and a stable 
fixed point (respectively). These results are typical. In contrast the windowed 
method yields models which exhibit bounded (almost) aperiodic dynamics . 



Closer examination of the laser dynamics indicates that it eventually settles to a 
stable periodic orbit (this phenomenon can be observed toward the end of the time 
series depicted in figure 6. 
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Using the windowed embedding we found that the long term dynamics for 
models of the VF data performed badly. However, this is to be expected as the 
optimal embedding window was 2. For this data none of the models produced 
with either method performed well. Hence, with this short, noisy and non- 
stationary data the models (built with either embedding strategy) failed to 
capture the underlying dynamics. Significantly, the estimate of provided by 
our algorithm indicated that this would be the case. We found that the optimal 
embedding is only two dimensional and this suggests that the best thing to 
do is to not build a model of the dynamics at all. Hence, even this negative 
result is encouraging: we have established the limitations of this algorithm and 
found that even in this situation the results are consistent. We note with some 
curiosity that the dynamics exhibited in the second panel of figure 5 closely 
resembles the human electrocardiogram during regular rhythm (see figure 7 for 
a representative recording) , despite the data being recorded during VF — we 
have no explanation for this. This behaviour, although exhibited by this one 
model was not present in simulations from all similar models. It is suggestive 
that the VF waveform includes information concerning the underlying (slower) 
sinus rhythm dynamics. But the evidence for this is definitely not conclusive. 



5 Conclusions 

We have approached the problem of optimal embedding from a modelling per- 
spective. In contrast to previous studies (which focused on estimating dynamic 
invariants) our primary concern was selection of embedding parameters that 
provide the optimal reconstruction of the underlying dynamics for an observed 
time series. To achieve this we assumed that the optimal model is that which 
minimises the description length the data. From this foundation we showed 
that the best embedding has a constant lag (r = 1) and a relatively large 
embedding window dyj. In general the optimal dy, will be determined by the 
amount of noise and the length of the the time series. From an information 
theoretic perspective this is what one would expect: r > 1 implies some in- 
formation is missing from the embedding. The optimal value of dy^ refiects a 
balance between a small embedding with two little information to reconstruct 
the dynamics and a large embedding where the model ceases to describe the 
dynamics. 

To compute the quantity dy, we introduced an extremely simple non-predictive 
local constant model of the data and select the value of dyj for which this model 
performs best. One can see that this offers a new and intuitive method for 
selection of embedding parameters. In essence, one could neglect description 
length and simply choose the embedding such that this model performs best. 
However, the addition of description length makes the optimal d^ dependent 
not only on the noise but also on the length of the time series. We see that 
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for short time series one shouldn't be confident of a large embedding window. 

The similarity between this new method of embedding window selection and 
the well estabhshed false nearest neighbour technique [24] is more than super- 
ficial . In section 3 and 4 we provided an explicit comparison to between our 
technique and the "standard" false nearest neighbour method. However, there 
are various improvements to this algorithm (such as [24]) which are worthy 
of further consideration. Nonetheless, there are several important distinctions 
between our method, and these false nearest neighbour techniques. As we have 
already emphasised, the aim of this method (to achieve the best model of the 
dynamics) differs from that of false nearest neighbours (topological unfold- 
ing). Furthermore, the incorporation of minimum description length means 
that our method explicitly penalises for short or noisy time series. 

At a functional level, the two algorithms are similar because both methods seek 
to avoid data points which are close, but which quickly diverge. Such points 
are (respectively) either false nearest neighbour of bad nonlinear predictors of 
one another. However, where as false nearest neighbour methods seek only to 
avoid this situation (i.e. spreading out the data is sufficient), the windowed 
embedding method insists that the neighbours which are the best predictors 
be found. 

Consider the situation where a systems' dynamics are either stochastic or ex- 
tremely high dimensional. Using false nearest neighbour methods, one may 
simply embed the data in a high enough dimension so that the data are suf- 
ficiently sparse. However, doing so docs not improve the nonlinear prediction 
error, consequently, the windowed embedding method would prefer a small 
embedding window. 

Conversely, consider the situation at a seperatrix. Points which are close do 
rapidly diverge from one another and so they will appear as false near neigh- 
bours for large embedding dimension, until (at a time scale similar to that of 
the underlying system) the points arc eventually, sufficiently spread). But from 
a nonlinear prediction view-point, these points are equally difficult to predict 
for all embedding dimension, and again the windowed embedding method will 
indicate a much smaller embedding dimension than that suggested by a strict 
application of false nearest neighbours . 



-"^^ The comparison of this method to that described in [24] is particularly apt. Cao 
introduces a modified false nearest neighbour approach which, like our method, 
avoids many of the subjective parameters of alternative techniques. 
^'^ We acknowledge that this problem is actually related to the "plateau" observed 
in plots of the fraction of false nearest neighbours against embedding dimension. 
In many cases, prudent selection of "plateau-onset" can minimise the problem. 
However, this remains somewhat subjective. 
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Finally, we note that the examples of section 3 showed that this method per- 
formed consistently and the apphcations in section 4 showed that selecting 
embedding parameters in this way improved the model one-step prediction er- 
ror. In effect, this is a demonstration that the method is working as expected. 
More significantly, we found that the dynamics produced by models built from 
windowed embedding also behaved more like the experimental dynamics than 
for models built from a standard embedding. This is a very positive results, 
however, we are now faced with a more substantial problem. The problem of 
building the best nonlinear model for the data once the embedding window 
has been determined [10]. Information theory has shown us that the optimal 
embedding should fix r = 1, we now need to consider the practice of non- 
linear modelling to determine which lags I = 1,2,3,. are significant for 
practical reconstruction from specific experimental systems. 
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